Avoiding overfitting of multilayer perceptrons by training derivatives
Abstract
© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Resistance to overfitting is observed for neural networks trained with an extended backpropagation algorithm. In addition to target values, its cost function uses their derivatives up to the 4th order. For common applications of neural networks, high-order derivatives are not readily available, so simpler cases are considered: training a network to approximate an analytical function inside 2D and 5D domains, and solving the Poisson equation inside a 2D circle. For function approximation, the cost is a sum of squared differences between output and target, as well as between their derivatives with respect to the input. Differential equations are usually solved by putting a multilayer perceptron in place of the unknown function and training its weights so that the equation holds within some margin of error. The commonly used cost is the squared residual of the equation; the added terms are the squared derivatives of that residual with respect to the independent variables. To investigate overfitting, the cost is minimized on points of regular grids with various spacings, and its root mean is compared with its value on a much denser test set. Fully connected perceptrons with six hidden layers and 2 · 10, 1 · 10 and 5 · 10 weights in total are trained with Rprop until the cost changes by less than 10% over the last 1000 epochs, or until the 10000th epoch is reached. Training the network with 5 · 10 weights to represent a simple 2D function using 10 points with 8 extra derivatives in each produces a test-to-train cost ratio of 1.5, whereas for classical backpropagation in comparable conditions this ratio is 2 · 10.
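As an illustration of the function-approximation cost described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a one-hidden-layer tanh perceptron with a scalar input, and it uses only the first derivative, whereas the paper goes up to the 4th order. The names `mlp_value_and_slope` and `derivative_cost` and the parameter layout are hypothetical and chosen only for this example.

```python
import math

def mlp_value_and_slope(params, x):
    """Evaluate a one-hidden-layer tanh perceptron (scalar in, scalar out).

    Returns the output and its first derivative with respect to the input x.
    params = (w, b, v, c): hidden weights, hidden biases, output weights,
    output bias (hypothetical layout used only in this sketch).
    """
    w, b, v, c = params
    y = c
    dy = 0.0
    for wi, bi, vi in zip(w, b, v):
        t = math.tanh(wi * x + bi)
        y += vi * t
        # Chain rule: d/dx tanh(wi*x + bi) = (1 - tanh^2) * wi
        dy += vi * (1.0 - t * t) * wi
    return y, dy

def derivative_cost(params, xs, target, target_slope):
    """Sum of squared differences in values AND first derivatives over xs."""
    total = 0.0
    for x in xs:
        y, dy = mlp_value_and_slope(params, x)
        total += (y - target(x)) ** 2 + (dy - target_slope(x)) ** 2
    return total

# Example usage on a small regular grid (assumed target: tanh).
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
slope = lambda x: 1.0 - math.tanh(x) ** 2
params = ([0.7], [0.1], [0.9], 0.0)
cost = derivative_cost(params, grid, math.tanh, slope)
```

In this sketch the derivative term penalizes a network that matches the target values at the grid points but has the wrong slope between them, which is the kind of behavior the abstract associates with overfitting. Minimizing this cost would require gradients with respect to the weights, which are omitted here.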
Similar resources
Avoiding overfitting in multilayer perceptrons with feeling-of-knowing using self-organizing maps.
Overfitting in multilayer perceptron (MLP) training is a serious problem. The purpose of this study is to avoid overfitting in on-line learning. To overcome the overfitting problem, we have investigated feeling-of-knowing (FOK) using self-organizing maps (SOMs). We propose MLPs with FOK using the SOMs method to overcome the overfitting problem. In this method, the learning process advances acco...
Deep Big Multilayer Perceptrons for Digit Recognition
The competitive MNIST handwritten digit recognition benchmark has a long history of broken records since 1998. The most recent advancement by others dates back 8 years (error rate 0.4%). Good old on-line back-propagation for plain multi-layer perceptrons yields a very low 0.35% error rate on the MNIST handwritten digits benchmark with a single MLP and 0.31% with a committee of seven MLP. All we...
On overfitting, generalization, and randomly expanded training sets
An algorithmic procedure is developed for the random expansion of a given training set to combat overfitting and improve the generalization ability of backpropagation trained multilayer perceptrons (MLPs). The training set is K-means clustered and locally most entropic colored Gaussian joint input-output probability density function (pdf) estimates are formed per cluster. The number of clusters...
Quantile regression with multilayer perceptrons
We consider nonlinear quantile regression involving multilayer perceptrons (MLP). In this paper we investigate the asymptotic behavior of quantile regression in a general framework. First by allowing possibly non-identifiable regression models like MLPs with redundant hidden units, then by relaxing the conditions on the density of the noise. In this paper, we present a universal bound for the...
Deep, Big, Simple Neural Nets for Handwritten Digit Recognition
Good old online backpropagation for plain multilayer perceptrons yields a very low 0.35% error rate on the MNIST handwritten digits benchmark. All we need to achieve this best result so far are many hidden layers, many neurons per layer, numerous deformed training images to avoid overfitting, and graphics cards to greatly speed up learning.
Journal title: CoRR
Volume: abs/1802.10301
Publication date: 2018